AdvGLUE
The Adversarial GLUE Benchmark
[Figure: Performance of RoBERTa (single model) on AdvGLUE]
[Figure: Performance of RoBERTa (single model) on each task. Panels: The Stanford Sentiment Treebank (SST-2); Quora Question Pairs (QQP); MultiNLI (MNLI) mismatched; Recognizing Textual Entailment (RTE)]